Showee’s Winning Job List

[Quantified Self]


Hi there! My name is Sho’ and I am a Data Analyst.
Welcome to my analysis!


I am aspiring to land a job as a Data Analyst, with an eye on becoming a CDO down the line. I have day-to-day business analysis experience in various fields (finance, education, legal) and hands-on skills backed by in-depth academic methodology (MBA, MS) that satisfy data analyst job descriptions anywhere. However, the hiring process that major corporations have recently deployed has been a real struggle for me, with no luck so far whatsoever. In full, shameless disclosure: my job applications number more than 200 this year, without a single offer for an ideal full-time job. Yes, I am at the point where my friends are worried, frankly.

The good side is that, because I recorded my progress along the way, I now have enough data to quantify and analyze: 300+ observations from this desperate situation. As a data analyst, I found my task here is to build disruptive strategies out of this serendipitous spin-off and construct my next moves. Data analysts are never bored as long as there are data (which, as you know, will never run out).

OK, enough introduction. Let’s get started and analyze my next moves.

My job application data can be found at https://docs.google.com/spreadsheets/d/1ug6rRgsNRyvToBPFtaATSiPkmAgOPLVlrBx_tJGjih8/edit?usp=sharing and can be imported as below:

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("waffle")) install.packages("waffle")
if (!require("googlesheets4")) install.packages("googlesheets4")
if (!require("wordcloud")) install.packages("wordcloud")
if (!require("lubridate")) install.packages("lubridate")
if (!require("knitr")) install.packages("knitr")
if (!require("ggbeeswarm")) install.packages("ggbeeswarm")
if (!require("plotly")) install.packages("plotly")

library(tidyverse)
library(waffle)
library(googlesheets4)
library(wordcloud)
library(lubridate)
library(knitr)
library(ggbeeswarm)
library(plotly)
# library(RColorBrewer)
gs4_deauth()

job_list <- read_sheet("https://docs.google.com/spreadsheets/d/1ug6rRgsNRyvToBPFtaATSiPkmAgOPLVlrBx_tJGjih8/edit#gid=0")
job_list <- job_list %>%
  filter(Title != "NA") # drop rows with no job title
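A subtle point about the filter above: `Title != "NA"` drops rows where Title is the literal string "NA", and, because `filter()` also discards rows where the condition evaluates to NA, it drops genuine missing values too. A self-contained toy illustration (synthetic titles, not the real sheet):

```r
# Synthetic titles: a real title, the literal string "NA", a true missing
# value, and another real title
titles <- c("Data Analyst", "NA", NA, "Business Analyst")
keep <- titles != "NA"   # TRUE, FALSE, NA, TRUE
# filter() keeps only rows where the condition is TRUE, so both the
# literal "NA" and the genuine missing value are dropped:
kept <- titles[!is.na(keep) & keep]
kept  # "Data Analyst" "Business Analyst"
```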


The variable job_list2022 holds only this year’s (2022) applications:

job_list2022 <- job_list %>%
  filter(year(Applied) > 2021)


Number of applications this year (live):

kable(nrow(job_list2022), col.names = "Total Number of Job Applications in 2022", align = "cc")
Total Number of Job Applications in 2022
217

To be honest, it feels like I applied for a lot more; I guess that’s because the energy I spend on each application surpasses that of any other laborious work for me.


Wordcloud from Company Names


Job Titles I Applied For

First, let’s see what job titles I applied for. As I said, my ideal title is Data Analyst, but the position does not necessarily have to sit in a data team or department, as long as it lets me access organizational data and analyze it to help the organization reach business solutions. Thus the position types are not limited to “Data Analyst.”

I sorted the positions that I applied for into nine categories, i.e.

  • Data Analyst,
  • Business Analyst,
  • Financial Analyst,
  • Research Analyst,
  • Project Manager,
  • Procurement,
  • Analyst (other),
  • Data (other), and
  • Other.

For reference, Procurement was my area of expertise before I started my academic journey, and I’d gladly contribute that knowledge too, as long as the role comes with plenty of DATA TO ANALYZE!

Here’s a visualization of which positions I applied for.

job_waffle <- job_list %>%
  group_by(Position) %>%
  summarize(position_count = n()) %>%
  mutate(ratio = ceiling(position_count / sum(position_count) * 100)) %>%
  arrange(desc(position_count))

positions <- job_waffle$ratio
names(positions) <- job_waffle$Position



waffle(positions, rows = 5, colors = c("navy", "deepskyblue1", "darkturquoise", "blue2", "cyan1", "darkblue", "cadetblue2", "deepskyblue3", "steelblue")) + # one color per position category
  labs(title = "Portion of Applied Positions")
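One caveat about the ceiling() rounding in job_waffle: every nonzero share is rounded up, so the squares can total more than 100 and the waffle gains a few extra tiles. A toy illustration with hypothetical counts (not the real data):

```r
# Hypothetical position counts
position_count <- c(120, 60, 20, 10, 5, 2)
ratio <- ceiling(position_count / sum(position_count) * 100)
sum(ratio)  # 103: each category rounded up, so the waffle shows extra tiles
# round() keeps the total at (or near) 100 instead, at the cost of
# possibly dropping categories whose share rounds to zero:
sum(round(position_count / sum(position_count) * 100))  # 100
```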

Easy as pie: obviously I applied for Data Analyst jobs most frequently, followed by Business Analyst jobs. It’s also true, though, that as I delved into data analysis roles, I became more interested in business analysis, drawing on the business insight from my MBA. I would like to compare these two to see whether applications for Business Analyst roles have increased.

da_ba <- job_list2022 %>%
   group_by(Position, Applied) %>%
   filter(Position %in% c("Data Analyst", "Business Analyst")) %>%
   summarize(Applications = n()) %>%
   mutate(cum_app = cumsum(Applications))
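One detail worth noting in da_ba: summarize() drops the last grouping level (Applied), so the result stays grouped by Position and cumsum() runs separately within each position. A base-R sketch of the same grouped cumulative sum, on made-up numbers:

```r
# Made-up per-day application counts for two positions
Applications <- c(2, 3, 1, 4)
Position     <- c("DA", "DA", "BA", "BA")
# ave() applies cumsum within each Position group, mirroring the
# grouped mutate(cum_app = cumsum(Applications)) used above
cum_app <- ave(Applications, Position, FUN = cumsum)
cum_app  # 2 5 1 5 -- the running total restarts for each position
```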

ggplotly(ggplot(da_ba, aes(x = Applied, y = cum_app, color = Position)) +
  geom_line(size = 2) +
  labs(title = "Cumulated Number of Applications", subtitle = "Data Analyst vs. Business Analyst", y = "Cumulated Number of Applications", x = "Applied Date (2022)") +
  scale_color_manual(values = c("Data Analyst" = "navy", "Business Analyst" = "deepskyblue1")) +
  theme_minimal())

OK, from the beginning of the year to now, the picture is fairly stable: the total number of Business Analyst applications has consistently been about half the number of Data Analyst applications. As you can see, though, just before July my Business Analyst applications nearly caught up with my Data Analyst applications, which may be why I thought I had grown more attracted to Business Analyst role descriptions. But the data show a consistently lower number of Business Analyst applications. A good example of how human perception at one point in time does not always match reality, isn’t it?


Overall, how many applications did I send on each designated search day?

app <- job_list2022 %>%
  select(Position, Applied) %>%
  group_by(Applied) %>%
  summarize(Applications = n()) # count applications per day

ggplot(app, aes(Applications)) +
  geom_histogram(binwidth = 1, fill = "cornflowerblue") +
  labs(title = "Number of Applications Submitted in a Day", x = "Number of Applications", y = "Days") +
  theme_minimal()
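For the numbers behind this histogram, the per-day counts can also be summarized directly. A self-contained sketch with hypothetical dates:

```r
# Hypothetical application dates: three on one search day, two on another,
# one on a third
applied <- as.Date(c("2022-01-03", "2022-01-03", "2022-01-03",
                     "2022-01-05", "2022-01-05", "2022-01-10"))
per_day <- table(applied)      # applications submitted per day
summary(as.integer(per_day))   # min / median / mean / max per day
```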


Job Search Platforms

Initially, I was randomly applying to whatever listing contained the words “Data” and “Analyst”, but at some point I started receiving unsolicited emails and calls from Indian recruiters who were obviously not based in this country. I was naive and sent my resume a couple of times, after which it was circulated within their network. I found out they craft job descriptions to match my profile in order to obtain my resume and the personal information on it. So I narrowed my options down to a mere 13 platforms through which to send my resume, and they are:

  • LinkedIn,
  • Indeed,
  • Glassdoor,
  • WayUp,
  • HandShake (university resource),
  • Company Site (by each),
  • DataCamp,
  • DVS (professional community job board),
  • DAA (professional community job board),
  • Google Coursera (for Google/Coursera certificate holders),
  • CUNY,
  • NYC (city employment),
  • Referral.


I’d first like to see which platforms are the ones I spent most of my time on.

pf_raw <- job_list2022 %>%
  group_by(pf = `Found on`) %>%
  summarize(pf_count = n()) %>%
  arrange(desc(pf_count))


  
ggplot(pf_raw, aes(x = reorder(pf, pf_count), y = pf_count, fill = pf)) +
  geom_segment(aes(xend = pf, yend = 0)) +
  geom_point(show.legend = F, size = 4, aes(color = pf))+
  geom_col(show.legend = F) +
  coord_flip() +
  scale_fill_manual(values = c("LinkedIn" = "#0077B5", "Indeed" = "#003A9B", "Referral" = "darkorchid4", "DAA" = "#BF2E1A")) +
  scale_color_manual(values = c("LinkedIn" = "#0077B5", "Indeed" = "#003A9B", "Referral" = "darkorchid4", "DAA" = "#BF2E1A")) +
  labs(title = "Platforms Used to Find Each Open Position", subtitle = "", x = NULL, y = "Number of Applications Submitted", caption = "Application submission platforms may differ") +
  theme_minimal()


How Many Ghosts?

Number of applications with no update for more than 90 days:

no_update <- which(is.na(job_list2022$`last update`))


no_res <- job_list2022 %>%
  filter(row_number() %in% no_update & Status == "Awaiting Response") %>%
  summarize(nr = n()) %>%
  mutate("% of No Responses" = nr/nrow(job_list2022)*100) %>%
  select("No Response" = nr, "% of No Responses")

ghosts <- job_list2022 %>%
  filter((Applied <= (Sys.Date() - 90)) & row_number() %in% no_update & Status == "Awaiting Response") %>%
  summarize(Ghosted = n()) %>%
  mutate("% of Ghosts" = Ghosted/nrow(job_list2022)*100) %>%
  select(Ghosted, "% of Ghosts")



kable(cbind(no_res, ghosts), col.names = c("Total No Responses", "% of No Responses", "Ghosted (no response more than 90 days)", "% of Ghosts"), align = "cc")
Total No Responses % of No Responses Ghosted (no response more than 90 days) % of Ghosts
111 51.15207 78 35.9447
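The ghost definition above — no last update, status still “Awaiting Response”, and applied more than 90 days ago — can be checked on a small synthetic example (made-up applications, with a fixed reference date instead of Sys.Date() for reproducibility):

```r
today <- as.Date("2022-07-01")  # fixed reference date
apps <- data.frame(
  Applied     = as.Date(c("2022-01-15", "2022-05-20", "2022-02-01")),
  last_update = as.Date(c(NA, NA, "2022-03-01")),
  Status      = c("Awaiting Response", "Awaiting Response", "Rejected")
)
# ghosted: no update, still awaiting, and older than 90 days
ghosted <- is.na(apps$last_update) &
  apps$Status == "Awaiting Response" &
  apps$Applied <= today - 90
sum(ghosted)  # 1: only the January application qualifies
```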


ggplot(job_list2022, aes(x = "", fill = Status)) +
  geom_bar() +
  coord_polar(theta = "y", direction = 1) +
  theme_void() +
  ggtitle('Current Application Status') +
  scale_fill_brewer(palette = "Blues")

Which platforms’ applications received responses? Responses include rejection emails.

with_update <- which(!is.na(job_list2022$`last update`))
length(with_update)
## [1] 88
job_list2022$Platform <- as.factor(job_list2022$`Found on`)

with_res <- job_list2022 %>%
  filter(row_number() %in% with_update) %>%
  group_by(pf = `Found on`) %>%
  summarize(response_count = n()) %>%
  arrange(desc(response_count))

platform_res <- pf_raw %>%
  left_join(with_res, by = "pf") %>%
  mutate("%" = response_count/pf_count*100, "no_res %" = 100 - response_count/pf_count*100)
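The left join above leaves response_count as NA for platforms that never replied, which is why the later steps filter on !is.na(response_count). A base-R sketch of the same join and percentage, on made-up counts:

```r
# Hypothetical platform counts; CUNY has no responses at all
pf  <- data.frame(pf = c("LinkedIn", "Indeed", "CUNY"),
                  pf_count = c(100, 50, 5))
res <- data.frame(pf = c("LinkedIn", "Indeed"),
                  response_count = c(30, 10))
joined <- merge(pf, res, by = "pf", all.x = TRUE)  # left join: CUNY gets NA
joined$pct <- joined$response_count / joined$pf_count * 100
joined  # CUNY's pct is NA, so it would be dropped by the !is.na() filter
```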

res_long <- platform_res %>%
  filter(!is.na(response_count)) %>%
  pivot_longer(c("pf_count", "response_count"), names_to = "count_by", values_to = "count") %>%
  select(pf, count_by, count)

ggplot(res_long, aes(x = reorder(pf, count), y = count, fill = count_by)) +
    geom_col(position = position_dodge(width = -0.3)) +
  coord_flip() +
  scale_fill_manual(values = c(response_count = "navy", pf_count = "cornflowerblue")) +
  theme_minimal()

res_long_pct <- platform_res %>%
  filter(!is.na(response_count)) %>%
  pivot_longer(c("%", "no_res %"), names_to = "response", values_to = "pct") %>%
  select(pf, response, pct)

ggplot(res_long_pct, aes(x = "", y = pct, fill = reorder(response, pct))) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y", direction = 1) +
  theme_void() +
  theme(legend.position = "bottom") +
  facet_wrap(~ pf, ncol = 5) +
  ggtitle('Response Rate by Platform') +
  scale_fill_manual(values = c("%" = "navy", "no_res %" = "azure2")) +
  labs(caption = "Responses include rejections")


How Long Do They Need?

Appendix

[Bloopers]

Just for the record, here are the visualizations that did not make it into the analysis above.


Unaesthetic, Data Not Suitable

These are just ugly; I found that box plots are better suited to larger datasets.


Misleading

I wanted to emphasize the steep climb in Business Analyst applications, but the trendline showed otherwise. Withdrawn.


Misconception

This cute lollipop chart may be better suited to displaying durations rather than volumes. However, I experimentally used it behind the bar chart to emphasize the direction of growth.